Customizable Parallel Execution of Scientific Stream Queries
Abstract
Scientific applications require processing high-volume on-line streams of numerical data from instruments and simulations. We present an extensible stream database system that allows scalable and flexible continuous queries on such streams. Application-dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are described as high-level data flow distribution templates. Using a generic template, we define two partitioning strategies for scalable parallel execution of expensive stream queries: window split and window distribute. Window split provides operators for parallel execution of query functions by reducing the size of stream data units, using application-dependent functions as parameters. By contrast, window distribute provides operators for customized distribution of entire data units without reducing their size. We evaluate these strategies for a typical high-volume scientific stream application and show that window split is favorable when expensive queries are executed on limited resources, while window distribute is better otherwise.
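The difference between the two strategies can be illustrated with a minimal Python sketch. It is not the system's actual operators or template language: the names `window_split`, `window_distribute`, `split_even`, `sum_of_squares`, and `route_fn` are hypothetical, a window is assumed to be a plain list of floats, and local threads stand in for distributed execution nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Window = List[float]  # assumption: one stream data unit is a list of numbers


def split_even(window: Window, n: int) -> List[Window]:
    """Hypothetical application-dependent split: up to n contiguous chunks."""
    size = max(1, -(-len(window) // n))  # ceiling division, at least 1
    return [window[i:i + size] for i in range(0, len(window), size)]


def sum_of_squares(window: Window) -> float:
    """Hypothetical query function; decomposable, so partial results can be added."""
    return sum(x * x for x in window)


def window_split(window: Window,
                 split_fn: Callable[[Window, int], List[Window]],
                 query_fn: Callable[[Window], float],
                 merge_fn: Callable[[Sequence[float]], float],
                 workers: int) -> float:
    """Window split: shrink the data unit, query the parts in parallel, merge."""
    parts = split_fn(window, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(query_fn, parts))
    return merge_fn(partials)


def window_distribute(windows: Sequence[Window],
                      query_fn: Callable[[Window], float],
                      workers: int,
                      route_fn: Callable[[int, int], int] = lambda i, n: i % n
                      ) -> List[float]:
    """Window distribute: route whole windows to workers, sizes unchanged."""
    queues: List[list] = [[] for _ in range(workers)]
    for i, w in enumerate(windows):
        queues[route_fn(i, workers)].append((i, w))  # default: round-robin

    def run(queue):
        # Each worker processes its own windows sequentially.
        return [(i, query_fn(w)) for i, w in queue]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        tagged = [pair for chunk in pool.map(run, queues) for pair in chunk]
    # Restore the original stream order of the results.
    return [result for _, result in sorted(tagged, key=lambda p: p[0])]


split_result = window_split([1.0, 2.0, 3.0, 4.0], split_even, sum_of_squares, sum, 2)
distribute_result = window_distribute([[1.0, 2.0], [3.0], [4.0, 5.0]], sum_of_squares, 2)
```

In this sketch, window split needs a split function and a merge function that suit the query, whereas window distribute only needs a routing rule, since every window is processed whole.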
Similar resources
Ivanova Scalable Scientific Stream Query Processing
Ivanova, M. 2005. Scalable Scientific Stream Query Processing. Acta Universitatis Upsaliensis. Uppsala Dissertations from the Faculty of Science and Technology 66. 137 pp. Uppsala. ISBN 91-554-6351-7 Scientific applications require processing of high-volume on-line streams of numerical data from instruments and simulations. In order to extract information and detect interesting patterns in thes...
Framework for Querying Distributed Objects Managed by a Grid Infrastructure
Queries over scientific data often imply expensive analyses of data requiring a lot of computational resources available in Grids. We are developing a customizable query processor built on top of an established Grid infrastructure, the NorduGrid middleware, and have implemented a framework for managing long running queries in Grid environment. With the framework the user does not specify the de...
Stream Execution of Object Queries
We show a novel execution method of queries over structural data. We present the idea in detail on SBQL (a.k.a. AOQL)—a powerful language with clean semantics. SBQL stands for the Stack-Based Query Language. The stack used in its name and semantics is a heavy and centralised structure which makes parallel and stream processing unfeasible. We propose to process stack-based queries without a stac...
Scalable Parallelization of Expensive Continuous Queries over Massive Data Streams
Zeitler, E. 2011. Scalable Parallelization of Expensive Continuous Queries over Massive Data Streams. Acta Universitatis Upsaliensis. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 836. 35 pp. Uppsala. ISBN 978-91-554-8095-0. Numerous applications in for example science, engineering, and financial analysis increasingly require online analysis...
Massive Scale-out of Expensive Continuous Queries
Scalable execution of expensive continuous queries over massive data streams requires input streams to be split into parallel substreams. The query operators are continuously executed in parallel over these sub-streams. Stream splitting involves both partitioning and replication of incoming tuples, depending on how the continuous query is parallelized. We provide a stream splitting operator tha...